1. Homepage

Figure 1 displays the homepage of konnect2prot 2.0, accessible at https://konnect2prot_v2.thsti.in/. Clicking the "Start Here" button, highlighted in Figure 1(A), directs users to the main application page. Selecting the "Contact Us" option, shown in Figure 1(B), opens a contact form where users can submit queries related to konnect2prot 2.0.

Figure 1: konnect2prot 2.0 homepage. (A) The "Start Here" button, highlighted by the red box, directs users to the main application page of konnect2prot 2.0. (B) The "Contact Us" button, also marked with a red box, opens the contact form where users can submit their queries directly to the team.

2. Dashboard

Figure 2 illustrates the konnect2prot 2.0 dashboard, the first window shown after leaving the homepage. The sidebar includes a short instruction set for guidance, as shown in Figure 2(A). The top navigation bar offers four main tabs, as depicted in Figure 2(B): Data, EDA (Exploratory Data Analysis), Visualization, and Network Analysis. Data uploads are handled through the Data tab. A flowchart gives an overview of the portal's workflow, as shown in Figure 2(C). Users with gene expression data should start from the Data tab, users with differentially expressed gene (DEG) files in the required format can continue from the Visualization tab, and users with only gene names or UniProt IDs should go to the Network Analysis tab.

Figure 2: The konnect2prot 2.0 application page includes the following features: (A) Basic instructions for each tab (Data, EDA, Visualization, and Network) help users understand their functions. (B) The navigation bar displays all four tabs, allowing users to easily access different sections of the portal. (C) A comprehensive flowchart illustrates the overall workflow of the konnect2prot 2.0 application.

3. Data Tab

1. The Data tab contains an upload sidebar where you can upload your file either by dragging it in or by clicking the "Drag and Drop files here" area, as shown in Figure 3.
2. The file opens automatically, and a data preview appears to the right of the upload sidebar (Figure 4).
3. A preliminary analysis of the uploaded data is then performed.
4. The results are shown graphically as a Box Plot, Mean-Variance Trend, Density Plot, and QQ Plot (Figure 5).

Figure 3: Data Tab of konnect2prot 2.0: (A) Users can upload gene expression files in TSV, XLSX, or CSV format, with a recommended file size of under 100 MB. (B) The Data tab includes several panels such as Data Upload Preview, Box Plot of Gene Expression, Mean-Variance Trend, Density Plot, and QQ Plot for initial data assessment.

Figure 4: After a file is uploaded by clicking "Click to Import", the data is displayed immediately in the Data Table section, as indicated by the arrow.

Figure 5: Plots generated under the Data tab include: (A) Box Plot showing the gene expression distribution across samples. (B) Mean-Variance Trend Plot, drawn against average log-expression. (C) Density Plot displaying the gene expression distribution for each sample. (D) QQ Plot representing the quantile distribution for each sample.

4. EDA

1. As shown in Figure 6, the Pre-processing Parameters sidebar lets you select the Normalization Method, Scaling Method, and Alpha Value (Significance Threshold).
2. The normalization options are log2, log10, and None.
3. The scaling options are Min-Max, Standard Scaling, and None.
4. Set the alpha value according to the needs of your experiment.

Figure 6: Exploratory Data Analysis tab of konnect2prot 2.0. The red box highlights the pre-processing parameters, including normalization methods (log2, log10) and scaling methods (Standard Scaling, Min-Max Scaling).
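
The pre-processing options listed above can be illustrated with a minimal Python sketch. It is not the portal's implementation: the function names, the pseudocount added before taking the logarithm, and the use of the population standard deviation for Standard Scaling are all assumptions made for illustration.

```python
import math
from statistics import mean, pstdev


def normalize(values, method="log2", pseudocount=1.0):
    """Apply a log2 or log10 transform, or return the values unchanged."""
    if method == "none":
        return list(values)
    if method not in ("log2", "log10"):
        raise ValueError(f"unknown normalization method: {method}")
    log = math.log2 if method == "log2" else math.log10
    # The pseudocount (an assumption) keeps zero values from producing log(0).
    return [log(v + pseudocount) for v in values]


def scale(values, method="min-max"):
    """Rescale values with Min-Max or Standard Scaling, or leave them as-is."""
    values = list(values)
    if method == "none":
        return values
    if method == "min-max":
        low, high = min(values), max(values)
        span = high - low
        # Constant input has no spread, so map everything to 0.0.
        return [(v - low) / span if span else 0.0 for v in values]
    if method == "standard":
        mu, sigma = mean(values), pstdev(values)
        return [(v - mu) / sigma if sigma else 0.0 for v in values]
    raise ValueError(f"unknown scaling method: {method}")


raw = [0, 1, 3, 7]                  # toy expression values, made up
logged = normalize(raw, "log2")     # log2(v + 1)
scaled = scale(logged, "min-max")   # mapped onto the interval [0, 1]
```

Whether the portal scales per gene or per sample is not stated here, so the sketch simply operates on one list of values at a time.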

5. Group Naming provides input boxes for defining the names of the two groups. You only need to select the columns for Set 1; the columns for Set 2 are selected automatically. For example, the data can be grouped into CASE and CONTROL, as shown in Figure 7.
6. After you click Submit, the DEGs Table is generated along with the Summary of Sample and DEG, as shown in Figure 8.

Figure 7: The red box highlights the "Define Group" section, where users can assign group names for two sets. Users are required to select columns for Set 1, while columns for Set 2 are automatically identified. The red arrow indicates the "Submit" button to proceed.

Figure 8: Under the EDA tab, the red box displays the qualitative exploration section, which includes a sample summary comparing the two sets, along with DEG summary and counts of significant genes.

7. Users can also explore the 'Pathway Exploration' tab to view a chosen pathway containing DEGs, the 'Complex Exploration' tab to view a chosen complex containing DEGs, the 'Similarity Exploration' tab to examine the expression similarity of a selected gene within the selected group, and the 'Expression Distribution' (up to 5 cross validation) tab to view box plots of the expression profile of the selected gene(s) in both groups, as shown in Figure 9.

Figure 9: (A-B) Show how gene alterations affect pathway analysis and protein complex analysis. (C-D) Allow users to view gene similarity comparisons and create box plots of the expression distribution for the chosen genes.

5. Visualization

1. This tab provides a volcano plot, pathway enrichment, a PCA of the complete gene expression data, and a PCA of the significant gene expression data, as shown in Figures 10(A) and 10(B).
2. In the volcano plot panel, the slider changes the log2 fold change threshold.

Figure 10(A): Visualization tab of konnect2prot 2.0. The controls in the red box in the sidebar allow users to set the log2(Fold Change) threshold with a slider and to choose the enrichment category (Pathway, Process, or Disease). The left panel displays the Volcano Plot of DEGs, while the right panel shows the Top 10 enrichment results.
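
To make the role of the log2 fold change slider concrete, the following Python sketch shows one common way a volcano plot separates genes into up-regulated, down-regulated, and non-significant groups. The function names, the default threshold of 1.0, the default alpha of 0.05, and the toy rows are assumptions for illustration only; the source does not say whether the portal applies alpha to raw or adjusted p-values.

```python
import math


def classify_gene(log2_fc, p_value, fc_threshold=1.0, alpha=0.05):
    """Label a gene by comparing it with the fold-change and alpha cut-offs."""
    significant = p_value < alpha
    if significant and log2_fc >= fc_threshold:
        return "up"
    if significant and log2_fc <= -fc_threshold:
        return "down"
    return "not significant"


def volcano_coordinates(log2_fc, p_value):
    """Return the (x, y) position of a gene on a volcano plot."""
    return log2_fc, -math.log10(p_value)


# Toy rows: (gene label, log2 fold change, p-value); all values are made up.
rows = [
    ("gene_A", 2.3, 0.001),
    ("gene_B", -1.8, 0.01),
    ("gene_C", 0.4, 0.0005),
    ("gene_D", 3.0, 0.2),
]

labels = {gene: classify_gene(fc, p) for gene, fc, p in rows}

# Moving the slider to a stricter threshold changes which genes are retained.
strict = {gene: classify_gene(fc, p, fc_threshold=2.0) for gene, fc, p in rows}
```

In this toy data, raising the threshold from 1.0 to 2.0 drops gene_B from the "down" group while gene_A remains "up".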

Figure 10(B): In the Visualization tab, the PCA of all features and the PCA of selected features are both displayed for comparative analysis.

3. Users can upload a DEG file in the specified format (as shown in Figure 11) to visualize the volcano plot and perform pathway enrichment analysis.
4. However, for generating PCA plots, a complete gene expression file is required.
5. Once the analysis is complete, users can proceed to the Network tab for further exploration.

Figure 11: As shown in the highlighted box, users have the option to directly upload Differentially Expressed Gene (DEG) files by following the specified format. This required format is clearly illustrated in the right panel, which serves as a visual reference to ensure correct file structure and column organization for successful upload and processing.

6. Network Analysis

Here, we have searched konnect2prot (k2p) using an example gene, "CDK1". The protein-protein interaction (PPI) network of CDK1 and its first neighbours is constructed in the right-hand panel. This network can be filtered further by "localization", "molecular functions", "biological processes", "tissue-specificity" or "pathways". An example is shown in Figure 12. For smooth visualisation, k2p provides different layout options, which can be found in the layout tab, as shown in Figure 13. Click the "analysis" button to obtain the enriched pathways and ontologies, the multi-disease interactome, and the topological analysis.

Figure 12: The Network Analysis tab allows users to input gene symbols or UniProt IDs directly into the text box, or fetch significant genes by clicking the button indicated by the red arrow. The red box displays the visualized protein interaction network.

Figure 13: Information such as PDB structures, ligands, interaction types, disease mutations, and residue details is displayed under the Local Properties section—highlighted by the red box—when a node or edge is clicked.

You may find out how many PDB structures are available for a protein by clicking on it in the created PPI network. The ligand panel provides information on small compounds and their mode of action for the query protein. The "mutation in disease" panel contains information on disease-specific mutations of this protein (if any). The information about the mode of interaction can be accessed by clicking an edge in the network. An example is shown in Figure 13.

a. Enrichment panel

After you click the "analysis" button, the enrichment panel displays the enriched pathways and processes for the proteins in the constructed PPI network. k2p also provides the protein class abundance and the multi-disease landscape of the proteins in the PPI network. This information is shown in Figure 14.

Figure 14: The Enrichment panel displays information on enriched pathways, processes, the multi-disease landscape, and protein class abundance.

b. Topological panel

The topological panel (see Figure 15) reports three key measures of centrality: degree, betweenness, and closeness. A plot of degree versus betweenness is also included to identify the proteins that act as hubs and bottlenecks in the constructed PPI network. For a detailed explanation of the different centrality measures and their applications, please refer to [1].
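
As a rough illustration of these three measures, the sketch below computes them in plain Python for a small toy network. It assumes an undirected, unweighted, connected graph stored as an adjacency dictionary, uses Brandes' algorithm for betweenness, and reports betweenness unnormalised; the node labels and function names are hypothetical, and nothing here is taken from the k2p code.

```python
from collections import deque


def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    return {v: len(nbs) / (n - 1) for v, nbs in graph.items()}


def closeness_centrality(graph):
    """Reachable-node count divided by the summed shortest-path distances."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:                      # breadth-first search from source
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Unnormalised shortest-path betweenness (Brandes' algorithm)."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        for w in reversed(order):         # accumulate from the farthest nodes
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair is counted from both endpoints in an undirected graph.
    return {v: b / 2 for v, b in bc.items()}


# Toy network: A is linked to B, C and D; D is also linked to E.
toy = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A", "E"], "E": ["D"]}

degree = degree_centrality(toy)
closeness = closeness_centrality(toy)
betweenness = betweenness_centrality(toy)

hub = max(degree, key=degree.get)                   # highest-degree node
bottleneck = max(betweenness, key=betweenness.get)  # highest-betweenness node
```

In this toy network node A is both the hub and the bottleneck; in a real PPI network the two rankings can differ, which is what a degree-versus-betweenness plot helps reveal.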

c. Spreaders

We have identified the influential spreaders in the network and augmented them with auxiliary information. Identifying a set of influential spreaders in a complex network is crucial for effective information spreading; here they are identified using the VoteRank algorithm [2]. The algorithm determines influential nodes through an iterative voting mechanism. In each turn, every node votes for its immediate neighbours, and the node receiving the most votes is elected as a spreader. The voting ability of the elected spreader's neighbours is then reduced in subsequent turns to prevent redundancy. The process continues until the desired number of spreaders has been identified. For details, please see [3].

The spreaders can be targets or triggers, depending on the context of the study. The identified triggers could be explored for various applications, such as potential drug targets. As illustrated in Figure 16, k2p identifies the triggers in the PPI network and lists their topological properties, cellular localisation, class, available PDB complexes and ligands. A clustergram of the pathways related to the spreaders (Figure 17) then gives an idea of which pathways are modified by the network's top spreaders. The clustergram and the high tissue specificity of these influential spreaders can be exported in .png format.
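
The voting procedure described above can be sketched in a few lines of Python. This is a simplified illustration rather than the k2p implementation: it assumes an undirected graph stored as an adjacency dictionary, reduces a neighbour's voting ability by the inverse of the average degree (as in the published VoteRank method), and uses hypothetical node labels.

```python
def voterank(graph, n_spreaders):
    """Elect up to n_spreaders nodes by iterative neighbour voting."""
    if not graph:
        return []
    avg_degree = sum(len(nbs) for nbs in graph.values()) / len(graph)
    if avg_degree == 0:
        return []
    decay = 1.0 / avg_degree                 # amount of voting ability removed
    ability = dict.fromkeys(graph, 1.0)      # every node starts with one vote
    elected, elected_set = [], set()

    while len(elected) < n_spreaders:
        # Each candidate's score is the summed voting ability of its neighbours.
        scores = {
            u: sum(ability[v] for v in graph[u])
            for u in graph
            if u not in elected_set
        }
        if not scores:
            break
        best = max(scores, key=scores.get)
        if scores[best] == 0:
            break                            # no remaining votes to cast
        elected.append(best)
        elected_set.add(best)
        ability[best] = 0.0                  # an elected spreader stops voting
        for v in graph[best]:
            # Weaken the neighbours' votes so later picks lie elsewhere.
            ability[v] = max(0.0, ability[v] - decay)
    return elected


# Toy network: two star-shaped clusters joined through their centres.
toy = {
    "H1": ["a", "b", "c", "H2"],
    "H2": ["d", "e", "H1"],
    "a": ["H1"], "b": ["H1"], "c": ["H1"],
    "d": ["H2"], "e": ["H2"],
}

top_two = voterank(toy, 2)
```

Because the neighbours of an elected node lose voting ability, the second spreader tends to come from a different part of the network than the first. Ties are broken here by dictionary order, which is an arbitrary choice of this sketch.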

Figure 15: The topological panel. You can use the designated buttons to search for a protein, navigate through the results, and even sort the analysis results.

Figure 16: The results for the top spreaders display various attributes including Name, Degree, InDegree, OutDegree, Betweenness, Clustering, Closeness, Protein Class, Location, Inhibitors, and PDB Complex Count.

d. Spreader-Hallmark associations

Protein-hallmark associations are another crucial feature of k2p. Every disease is driven by specific characteristics, or hallmarks. In konnect2prot 2.0, hallmarks refer to key pathological traits associated with diseases, derived from resources such as CancerGeneNet. For example, in cancer studies, hallmarks represent essential processes such as sustaining proliferative signalling and evading growth suppressors, as described in [4]. Identifying the proteins associated with these hallmarks helps identify new therapeutic targets with more specific pharmacological activity. Various drugs are deliberately developed for specific molecular targets involved in these hallmarks [5]. To address this, k2p incorporates two crucial aspects of drug discovery: protein-hallmark associations and protein-signalling pathway associations (Figure 18). The latter enables the identification of not just intra-pathway deregulation but also the interdependence of pathways. This information can in turn be used to deduce the pleiotropic effects of a large number of genes on the distinct pathways that contribute to the development of specific disease characteristics or traits. A directed bipartite graph of hallmark signalling is presented in Figure 18(A), linking spreader genes, shown as black dots, to cancer hallmarks. Another directed bipartite graph, shown in Figure 18(B), maps spreader genes (black dots) and their corresponding targets (red dots) to the associated signalling pathways.

Figure 17: Pathway clustergram of spreaders: konnect2prot 2.0 creates a clustergram that visualizes the extent of cross-talk between different pathways impacted by the spreaders.

Figure 18: Links between spreaders and disease. (A) Directional bipartite network illustrating the association between spreader genes and cancer hallmarks, providing insight into disease pathophysiology. (B) Directional bipartite network mapping spreader genes and their associations with signalling pathways.

References